Prediction of Population Health Indices from Social Media using Kernel-based Textual and Temporal Features
Authors
Abstract
Since 1984, the US has annually conducted the Behavioral Risk Factor Surveillance System (BRFSS) surveys to capture either health behaviors, such as drinking or smoking, or health outcomes, including mental, physical, and generic health, of the population. Although this kind of information at a population level, such as US counties, is important for local governments to identify local needs, traditional datasets may take years to collate and to become publicly available. Geocoded social media data can provide an alternative reflection of local health trends. In this work, we propose novel textual and temporal features to predict the percentage of adults in a county reporting "insufficient sleep" (a health behavior) and, at the same time, their health outcomes. The proposed textual features are defined at mid-level and can be applied on top of various low-level textual features. They are computed via kernel functions on the underlying features and encode the relationships between individual underlying features over a population. To further improve prediction of the health indices, the textual features are augmented with temporal information. We evaluated the proposed features and compared them with existing features using a dataset collected from the BRFSS. Experimental results show that the combination of kernel-based textual features and temporal information predicts both the health behavior (with best performance at rho=0.82) and the health outcomes (with best performance at rho=0.78) well, demonstrating the capability of social media data in the prediction of population health indices. The results also show that our proposed features achieved higher correlation coefficients than the existing ones, increasing the correlation coefficient by up to 0.16, suggesting the potential of the approach in a wide spectrum of data analytics applications at population levels.
©2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. WWW 2017 Companion, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4914-7/17/04. http://dx.doi.org/10.1145/3041021.3054136
Similar resources
Predicting IMDB Movie Ratings Using Social Media
We predict IMDb movie ratings and consider two sets of features: surface and textual features. For the latter, we assume that no social media signal is isolated and use data from multiple channels that are linked to a particular movie, such as tweets from Twitter and comments from YouTube. We extract textual features from each channel to use in our prediction model and we explore whether data f...
Prediction of academic achievement motivation based on academic alienation, social support, media usage and demographic variables
Academic achievement motivation plays an important role in academic success, so its prediction and improvement are very important. This study investigated the relationship between academic achievement motivation and academic alienation, social support, media usage and demographic variables, and also predicted academic achievement motivation. The study was correlational. Its statistical population inc...
Temporal and Social Context Based Burst Detection from Folksonomies
Burst detection is an important topic in temporal stream analysis. Usually, only the textual features are used in burst detection. In the theme extraction from current prevailing social media content, it is necessary to consider not only textual features but also the pervasive collaborative context, e.g., resource lifetime and user activity. This paper explores novel approaches to combine multi...
A Review of Spatial Factor Modeling Techniques in Recommending Point of Interest Using Location-based Social Network Information
The rapid growth of mobile phone technology and its combination with various technologies like GPS has added location context to social networks and has led to the formation of location-based social networks. In social networking sites, recommender systems are used to recommend points of interest (POIs) to users. Traditional recommender systems, such as film and book recommendations, have a lon...
Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods
A vast amount of textual web streams is influenced by events or phenomena emerging in the real world. The social web forms an excellent modern paradigm, where unstructured user generated content is published on a regular basis and on most occasions is freely distributed. The present Ph.D. Thesis deals with the problem of inferring information – or patterns in general – about events emerging in ...
Journal title:
Volume / Issue
Pages -
Publication date 2017